DETAILED ACTION 
Response to Arguments 

1. The claims were objected to because of the misuse of the term "voice 
recognition" for what nowadays is called -speech recognition-. The Applicant argues 
that the term "voice" is intended to include more than just "speech" such as the 
identification of the speaker and recognition of words and sounds. The Applicant cites 
pages 23-25 and Fig. 8 as support for this claim. The Examiner respectfully disagrees. 
Figure 8 and pages 23-25 of the specification are drawn to a voice (speech) recognition 
system that performs an A/D conversion on the input (21), extracts characteristics from 
the speech such as Mel Cepstrum Coefficients, LPC's, power, pitch information, etc. 
(page 24, lines 3-19), and then the matching unit performs recognition using the 
extracted characteristics based on HMM, acoustic models, dictionaries and grammars 
(page 24, line 20 to page 25, line 25). Nowhere is the term "voice recognition" defined 
as the identification of the individual speaker, recognition of meanings and recognition of 
sounds. Additionally, the specification fails to exhibit any support for the identification of 
the speaker such as comparing the processed speech information to specific user voice 
templates. For at least these reasons the objection to the claims is proper and the 
objection stands. 

2. Applicant's arguments with respect to claims 1, 3-5 and 8-11 have been 
considered but are moot in view of the new ground(s) of rejection. 
Claim Objections 

The disclosure and claims are objected to because the term "voice recognition" is 
misused for what nowadays is called -speech recognition- in the speech signal 
processing art. While "voice recognition" and "speech recognition" were both once used 
interchangeably to refer to spoken word recognition, nowadays these two terms are 
distinguished. The term "voice recognition" now denotes identification of who is doing 
the speaking (class 704/246), while "speech recognition" (or "word recognition") 
denotes identification of what is being said (class 704/251). So, appropriate correction 
to the proper terms of art is required. 

Claim Rejections - 35 USC § 103 

3. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

4. Claims 1, 3-5 and 8-11 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Kamiya et al. (U.S. Pat. 6,629,242) in view of Petrushin (U.S. Pat. 
2002/0194002A1). 

As per claims 1, 10 and 11, Kamiya teaches a speech processing device, 
method and recording medium executing a program built into a robot (col. 3, lines 63- 
67), comprising: 
control means for controlling speech processing by said speech processing 
means (neural network used to determine the emotion from the speech is adapted, col. 
8, lines 34-60 and robot's speech is adapted, col. 10, line 66 to col. 11, line 4), based on 
a state of said robot; wherein the state is determined by an action, an emotion state and 
an instinct state of the robot (adaptation of the neural network is subjected to an 
evaluation of a decided action (instinct) that is based on its current emotion state and 
how this action is judged by the user, col. 8, lines 34-60); 

wherein said emotion and instinct states are determined on the basis of values 
corresponding to a plurality of states of an emotion model and an instinct model, 
respectively; wherein the value corresponding to each state within the emotion model 
and within the instinct model are linked in a mutually stimulating manner (relationships 
exist between emotional model and corresponding patterns of behavior, col. 11, lines 8- 
23); 

wherein said voice processing means comprises voice recognizing means for 
recognizing the voice input (sound/voice detection unit, col. 6, lines 9-15); and 

wherein said robot takes actions corresponding to a reliability of the voice 
recognition results output from said voice recognizing means, or the emotion state of 
said robot is changed based on said reliability (performs speech recognition which 
would inherently have an acoustic model that would choose the corresponding meaning 
with the highest probability hence the most reliable, col. 6, lines 9-15). 
Kamiya does not specifically teach the speech processing means for processing 
a speech input including extracting control pitch information or phonemics information 
and changing the state based on this information. 

Petrushin teaches a system for detecting emotion in speech that extracts the 
pitch from the incoming speech to classify the emotion of the speaker (paragraph 42). 

It would have been obvious to one of ordinary skill in the art at the time of 
invention to modify the system of Kamiya to extract control pitch information and change 
the state based on this information because, as taught by Petrushin, pitch is the main 
vocal cue for emotion recognition (paragraph 48). 

5. As per claim 3, Kamiya teaches wherein said voice processing means comprises 
voice synthesizing means for performing voice synthesizing processing and outputting 
synthesized sound (outputs sounds or responses, col. 10, line 66 to col. 11, line 4); 

and wherein said control means control the voice synthesizing processing by 
said voice synthesizing means, based on the state of said robot (outputs the voice 
suitable for the current pseudoemotion, col. 10, line 66 to col. 11, line 4). 

6. As per claim 4, Kamiya teaches wherein said control means control phonemics 
information and pitch information output by said voice synthesizing means (synthesizes 
an answer hence each answer would have different phonemes and because pitch is 
the main vocal cue in emotion the voice for the current pseudoemotion would have a 
controlled pitch, col. 10, line 66 to col. 11, line 4). 

7. As per claim 5, Kamiya teaches wherein said control means control the speech 
speed or volume of synthesized sound output by said voice synthesizing means 
(outputs the voice suitable for the current pseudoemotion wherein speech of different 
emotions would inherently have different speeds and volumes, col. 10, line 66 to col. 11, 
line 4). 

8. As per claim 8, Kamiya teaches wherein said control means recognizes the 
action which said robot is taking, and controls voice processing by said voice 
processing means based on the load regarding that action (stores relationships 
between emotion and behavior and outputs speech in regard to the current emotion, 
col. 10, line 66 to col. 11, line 23). 

9. As per claim 9, Kamiya teaches wherein said robot takes actions corresponding 
to resources which can be appropriated to voice processing by said voice processing 
means (inherently would retrieve phonetic information to synthesize the speech, col. 10, 
line 66 to col. 11, line 4). 

Conclusion 

10. The prior art made of record and not relied upon is considered pertinent to 
applicant's disclosure. Gabai et al. (U.S. Pat. 6,160,986) and Wang (U.S. Pat. 
6,192,215) teach toys that contain speech recognition for interaction. Fujimura et al. 
(U.S. Pat. 6,792,406), filed after the current application, teaches using emotion in 
speech to control an interactive electronic pet that is suggested to be used in a 
mechanical pet. 



Application/Control Number: 09/723,813 Page 7 

Art Unit: 2655 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Matthew J. Sked whose telephone number is (571) 272- 
7627. The examiner can normally be reached on Mon-Fri (8:00 am - 4:30 pm). 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Wayne Young, can be reached on 571-272-7582. The fax phone number for 
the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 